Introduction to data science in R
Lesson 4: For loops


Brian S. Evans, Ph.D.
Migratory Bird Center
Smithsonian Conservation Biology Institute


Setup for the lesson


# Load RCurl library:

library(RCurl)

# Load a source script:

script <-
  getURL(
    "https://raw.githubusercontent.com/bsevansunc/workshop_languageOfR/master/sourceCode.R"
  )

# Evaluate then remove the source script:

eval(parse(text = script))

rm(script)

For loops


Why would you use for loops?

# Filter irisTbl to setosa:

irisTbl[irisTbl$species == 'setosa', ]

# Extract the petalLength field (column):

irisTbl[irisTbl$species == 'setosa', ]$petalLength

# Calculate the mean of petal lengths:

mean(irisTbl[irisTbl$species == 'setosa', ]$petalLength)

Exercise One:


Calculate the mean petal length of each of the Iris species using matrix notation (as above) and a custom function.


# Mean petal lengths, matrix notation:

mean(irisTbl[irisTbl$species == 'setosa', ]$petalLength)
mean(irisTbl[irisTbl$species == 'versicolor', ]$petalLength)
mean(irisTbl[irisTbl$species == 'virginica', ]$petalLength)

# Mean petal lengths, function method:

meanPetalFun <- function(spp){
  mean(irisTbl[irisTbl$species == spp, ]$petalLength)
}

meanPetalFun('setosa')
meanPetalFun('versicolor')
meanPetalFun('virginica')

Indexing review, vectors


Consider the following numeric vector, v:


Position: [1] [2] [3] [4] [5]
Value:     1   1   2   3   5

Vector v is an R object made up of five numbers.

# Explore vector v:

v

class(v)

str(v)

length(v)
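The vector v is presumably created by the source script loaded during setup. If that script is unavailable, the following sketch recreates it from the values displayed above. This is an assumption based on the slide; the original object may differ in storage type (for example, integer rather than double).

```r
# Recreate v from the values shown above (assumed, not taken from the source script):
v <- c(1, 1, 2, 3, 5)

# Confirm that v is a numeric vector of five values:
class(v)

length(v)
```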

Each value in a vector has a position, denoted by “[i]”.

Recall: v[i] is the value of v at position i.

# Explore vector v using indexing:

i <- 3

v[i]

v[3]

v[3] == v[i]

Indexing review, vectors



\[V_{new, i} = V_{i} + 1\]

# Add 1 to the value of v at position three:

i <- 3

v[3] + 1

v[i] + 1

For loops, simple example



\[V_{new, i} = V_{i} + 1\]

Writing proper for loops requires following these three steps:

  1. Output: Always define an object for storing output (e.g., an empty vector, matrix, or list).
  2. Sequence: The positions over which the loop will run.
  3. Body: The instructions for what will occur during each iteration of the loop.
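
The three steps can be sketched on a small, hypothetical vector before applying them to v. The names x and xSquared are introduced here only for illustration:

```r
# A hypothetical input vector:
x <- c(2, 4, 6)

# 1. Output: a numeric vector with one slot per element of x
xSquared <- numeric(length(x))

# 2. Sequence: the positions 1, 2, and 3
for (i in seq_along(x)) {
  # 3. Body: square the value at position i and store it
  xSquared[i] <- x[i]^2
}

xSquared
```

Each pass through the loop fills exactly one position of the output, so the output object must exist, at its full length, before the loop starts.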

For loop, output:



\[V_{new, i} = V_{i} + 1\]

ALWAYS specify an object to store your output!

Vector objects are defined as:

# Define a vector for output:

vNew <- vector('numeric', length = length(v))

str(vNew)
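
vector() can pre-allocate other kinds of output as well. A brief sketch of the default values each mode is filled with (the object name emptyList is illustrative only):

```r
# Numeric vectors are filled with zeros:
vector('numeric', length = 3)

# numeric() is shorthand for the same thing:
numeric(3)

# Character vectors are filled with empty strings:
vector('character', length = 3)

# Lists are filled with NULL:
emptyList <- vector('list', length = 3)

str(emptyList)
```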

# Explore filling values of vNew by index:

i <- 3

v[i]

vNew[i] <- v[i] + 1

vNew[i]

v[i] + 1 == vNew[i]

For loop, sequence


The sequence can be defined as:

v

1:5

1:length(v)

seq_along(v)

# Example for loop sequence statements:

# for(i in 1:length(v))
  
# for(i in seq_along(v))
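
Both statements visit the same positions when v has at least one element. They differ for an empty vector, which is one reason seq_along() is often preferred. A small sketch, using a hypothetical empty vector named emptyV:

```r
# A hypothetical vector with no elements:
emptyV <- numeric(0)

# 1:length(emptyV) counts down from 1 to 0, so a loop would run twice:
1:length(emptyV)

# seq_along(emptyV) is empty, so a loop would not run at all:
seq_along(emptyV)
```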

For loop, body


The for loop body describes what will happen at each iteration of the loop. For example:

i <- 3

vNew[i] <- v[i] + 1

For loop, putting it together


  1. Output
  2. Sequence
  3. Body

# For loop output:

vNew <- numeric(length = length(v))

# For loop sequence:

for(i in seq_along(v)){
  # For loop body:
  vNew[i] <- v[i] + 1
}

# Explore first for loop output:

vNew

v

vNew == v + 1

Subsetting with for loops


Split-Apply-Combine

# Mean petal lengths of Iris species without a for loop:

mean(irisTbl[irisTbl$species == 'setosa', ]$petalLength)

mean(irisTbl[irisTbl$species == 'versicolor', ]$petalLength)

mean(irisTbl[irisTbl$species == 'virginica', ]$petalLength)

Start by creating a vector of species:

# Make a vector of species to loop across:

irisSpecies <- levels(irisTbl$species)

irisSpecies

Create an empty vector to store our output:

# For loop output statement:

petalLengths <- vector('numeric',length = length(irisSpecies))

petalLengths

Split: The body of the for loop starts by splitting the data:

# Exploring the iris data, subsetting by species:

i <- 3

irisSpecies[i]

irisTbl[irisTbl$species == irisSpecies[i], ]

# Split:

iris_sppSubset <- irisTbl[irisTbl$species == irisSpecies[i], ]

Apply: Modify or summarize the subset of the data:

# Calculate mean petal length of each subset:

mean(iris_sppSubset$petalLength)

Subsetting with for loops


Split-Apply-Combine

# Make a vector of species to loop across:

irisSpecies <- levels(irisTbl$species)

# For loop output statement:

petalLengths <- vector('numeric',length = length(irisSpecies))

# For loop:

for(i in seq_along(irisSpecies)){
  # Split:
  iris_sppSubset <- irisTbl[irisTbl$species == irisSpecies[i], ]
  # Apply:
  petalLengths[i] <- mean(iris_sppSubset$petalLength)
}

Combine: Combine the for loop output into a single data frame:

# Make a tibble data frame of the for loop output:

petalLengthFrame <- data_frame(species = irisSpecies, meanPetalLength = petalLengths)

petalLengthFrame
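
For readers without access to the source script, the whole split-apply-combine loop can be sketched with R's built-in iris data. The stand-in table irisStandIn and its field names are assumptions chosen to mirror irisTbl, and base data.frame() is used in place of a tibble:

```r
# Build a stand-in for irisTbl from the built-in iris data (assumed structure):
irisStandIn <- data.frame(
  species = iris$Species,
  petalLength = iris$Petal.Length
)

# Species to loop across:
sppNames <- levels(irisStandIn$species)

# Output:
meanLengths <- numeric(length(sppNames))

for (i in seq_along(sppNames)) {
  # Split:
  sppSubset <- irisStandIn[irisStandIn$species == sppNames[i], ]
  # Apply:
  meanLengths[i] <- mean(sppSubset$petalLength)
}

# Combine:
data.frame(species = sppNames, meanPetalLength = meanLengths)
```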

Exercise Two:


Use a for loop and the birdHabits data frame to calculate the number of species in each diet guild.


birdHabits

diets <- unique(birdHabits$diet)

outVector <- vector('numeric', length = length(diets))

for(i in seq_along(diets)){
  # Split:
  dietSubset <- birdHabits[birdHabits$diet == diets[i],]
  # Apply:
  outVector[i] <- nrow(dietSubset)
}

# Combine: 
data_frame(diet = diets, nSpecies = outVector)

For loops across data objects


For loops can be used to explore data objects with common features.

How many omnivorous birds were observed at each site?

# Explore the bird count data:

head(birdCounts)

str(birdCounts)

# Explore the bird trait data:

head(birdHabits)

str(birdHabits)

Get a vector of the omnivorous species from the birdHabits data frame:

# Extract vector of omnivorous species:

omnivores <- birdHabits[birdHabits$diet == 'omnivore',]$species

Split the data into individual sites.

# Generate a vector of unique sites:

sites <- unique(birdCounts$site)

# Site at position i:

i <- 3

sites[i]

# Subset data:

birdCounts_siteSubset <- birdCounts[birdCounts$site == sites[i],]

birdCounts_siteSubset

Split: Use %in% to extract only the count records associated with omnivores.


# Just a vector of omnivore counts:

countVector <-
  birdCounts_siteSubset[birdCounts_siteSubset$species %in%
  omnivores,]$count
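
%in% returns one logical value per element on its left-hand side, TRUE where that element appears anywhere in the vector on the right. A minimal sketch with made-up species codes (hypothetical values, not taken from birdCounts):

```r
# Hypothetical species observed at a site:
observed <- c('amro', 'noca', 'grca', 'amro')

# Hypothetical species of interest:
targets <- c('amro', 'grca')

# One TRUE or FALSE per element of observed:
observed %in% targets

# Use the logical vector to subset:
observed[observed %in% targets]
```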

Apply: Sum the count vector.


# Get total number of omnivores at the site:

nOmnivores <- sum(countVector)

Combine: Values combined using the vector method

sites <- unique(birdCounts$site)

outVector <- vector('numeric', length = length(sites))

for(i in seq_along(sites)){
  birdCounts_siteSubset <- birdCounts[birdCounts$site == sites[i],]
  countVector <-
    birdCounts_siteSubset[birdCounts_siteSubset$species %in%
    omnivores, ]$count
  outVector[i] <- sum(countVector)
}

# Combine:

data_frame(site = sites, nOmnivores = outVector)

Combine: Values combined using the list method

sites <- unique(birdCounts$site)

outList <- vector('list', length = length(sites))

for(i in seq_along(sites)){
  birdCounts_siteSubset <- birdCounts[birdCounts$site == sites[i],]
  countVector <-
    birdCounts_siteSubset[birdCounts_siteSubset$species %in%
    omnivores,]$count
  outList[[i]] <- data_frame(
    site = sites[i],
    nOmnivores = sum(countVector))
}

# Combine:

bind_rows(outList)
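
To check the logic without the workshop data, the list method can be run on a small made-up example. The two tables below are hypothetical stand-ins that assume birdCounts has site, species, and count fields and birdHabits has species and diet fields. Base data.frame() and do.call(rbind, ...) are used here in place of data_frame() and bind_rows():

```r
# Hypothetical stand-ins for the workshop data:
toyHabits <- data.frame(
  species = c('amro', 'noca', 'grca'),
  diet = c('omnivore', 'granivore', 'omnivore'),
  stringsAsFactors = FALSE
)

toyCounts <- data.frame(
  site = c('a', 'a', 'b', 'b', 'b'),
  species = c('amro', 'noca', 'amro', 'grca', 'noca'),
  count = c(2, 5, 1, 3, 4),
  stringsAsFactors = FALSE
)

# Omnivorous species and unique sites:
toyOmnivores <- toyHabits[toyHabits$diet == 'omnivore', ]$species

toySites <- unique(toyCounts$site)

# Output:
toyList <- vector('list', length = length(toySites))

for (i in seq_along(toySites)) {
  # Split:
  siteSubset <- toyCounts[toyCounts$site == toySites[i], ]
  counts <- siteSubset[siteSubset$species %in% toyOmnivores, ]$count
  # Apply:
  toyList[[i]] <- data.frame(site = toySites[i], nOmnivores = sum(counts))
}

# Combine:
toyResult <- do.call(rbind, toyList)

toyResult
```

With these toy values, site a has two omnivores (2 amro) and site b has four (1 amro plus 3 grca), which can be confirmed by hand.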

Simulation with for loops


For loops can also generate a vector of numbers from a mathematical function in which each value depends on the one before it. For example:


\[n_t = 2(n_{t-1})\]

# For loop output:


n <- vector('numeric', length = 5)

n

# Set the seed value:

n[1] <- 10

n

# For loop sequence:

# for(i in 2:length(n))

Body: For each iteration (example, position 2):

# Exploring the construction of the for loop body:

i <- 2

n[i]

n[i-1]

n[i] <- 2*n[i-1]

n

# Output:

n <- vector('numeric', length = 5)

# Seed:

n[1] <- 10

# For loop:

for(i in 2:length(n)){
  n[i] <- 2*n[i-1]
}

n
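
Because each value is twice the one before it, this recurrence has a closed form, 10 * 2^(t - 1), which gives a quick check on the loop. A sketch of that check (nLoop and nClosed are names introduced here for illustration):

```r
# Output and seed:
nLoop <- vector('numeric', length = 5)

nLoop[1] <- 10

# For loop:
for (i in 2:length(nLoop)) {
  nLoop[i] <- 2 * nLoop[i - 1]
}

# Closed form, 10 * 2^(t - 1) for t = 1 to 5:
nClosed <- 10 * 2^(0:4)

nLoop

nClosed

# The two approaches should agree at every position:
all(nLoop == nClosed)
```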

Exercise Three:


One of my favorite for loops was created by Leonardo Bonacci (Fibonacci). He created the first known population model, from which the famous Fibonacci number series was derived. He described a population (N) of rabbits at time t as the sum of the population at the previous time step and the population at the time step before that:

\[N_t = N_{t-1} + N_{t-2}\]

  1. Create an output vector of 20 numeric values.
  2. Seed the vector with the first two values, 0 and 1.
  3. Use the formula above and your seed vector to generate the first 20 numbers of the Fibonacci number sequence.